StereoGene: rapid estimation of genome-wide correlation of continuous or interval feature data

نویسندگان

  • Elena D. Stavrovskaya
  • Tejasvi Niranjan
  • Elana J. Fertig
  • Sarah J. Wheelan
  • Alexander V. Favorov
  • Andrey A. Mironov
چکیده

Motivation Genomics features with similar genome-wide distributions are generally hypothesized to be functionally related, for example, colocalization of histones and transcription start sites indicate chromatin regulation of transcription factor activity. Therefore, statistical algorithms to perform spatial, genome-wide correlation among genomic features are required. Results Here, we propose a method, StereoGene, that rapidly estimates genome-wide correlation among pairs of genomic features. These features may represent high-throughput data mapped to reference genome or sets of genomic annotations in that reference genome. StereoGene enables correlation of continuous data directly, avoiding the data binarization and subsequent data loss. Correlations are computed among neighboring genomic positions using kernel correlation. Representing the correlation as a function of the genome position, StereoGene outputs the local correlation track as part of the analysis. StereoGene also accounts for confounders such as input DNA by partial correlation. We apply our method to numerous comparisons of ChIP-Seq datasets from the Human Epigenome Atlas and FANTOM CAGE to demonstrate its wide applicability. We observe the changes in the correlation between epigenomic features across developmental trajectories of several tissue types consistent with known biology and find a novel spatial correlation of CAGE clusters with donor splice sites and with poly(A) sites. These analyses provide examples for the broad applicability of StereoGene for regulatory genomics. Availability and implementation The StereoGene C ++ source code, program documentation, Galaxy integration scripts and examples are available from the project homepage http://stereogene.bioinf.fbb.msu.ru/. Contact [email protected]. Supplementary information Supplementary data are available at Bioinformatics online.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Camera Pose Estimation in Unknown Environments using a Sequence of Wide-Baseline Monocular Images

In this paper, a feature-based technique for the camera pose estimation in a sequence of wide-baseline images has been proposed. Camera pose estimation is an important issue in many computer vision and robotics applications, such as, augmented reality and visual SLAM. The proposed method can track captured images taken by hand-held camera in room-sized workspaces with maximum scene depth of 3-4...

متن کامل

Bayes Interval Estimation on the Parameters of the Weibull Distribution for Complete and Censored Tests

A method for constructing confidence intervals on parameters of a continuous probability distribution is developed in this paper. The objective is to present a model for an uncertainty represented by parameters of a probability density function.  As an application, confidence intervals for the two parameters of the Weibull distribution along with their joint confidence interval are derived. The...

متن کامل

Sub-optimal Estimation of HIV Time-delay Model using State-Dependent Impulsive Observer with Time-varying Impulse Interval: Application to Continuous-time and Impulsive Inputs

Human Immunodeficiency Virus (HIV) weakens the immune system in confronting various diseases by attacking to CD4+T cells. In modeling HIV behavior, the number of CD4+T cells is considered as the output. But, continuous-time measurement of these cells is not possible in practice, and the measurement is only available at variable intervals that are several times bigger than sampling time. In this...

متن کامل

Improvement of effort estimation accuracy in software projects using a feature selection approach

In recent years, utilization of feature selection techniques has become an essential requirement for processing and model construction in different scientific areas. In the field of software project effort estimation, the need to apply dimensionality reduction and feature selection methods has become an inevitable demand. The high volumes of data, costs, and time necessary for gathering data , ...

متن کامل

O-36: Genome Haplotyping and Detection of Meiotic Homologous Recombination Sites in Single Cells, A Generic Method for Preimplantation Genetic Diagnosis

Background: Haplotyping is invaluable not only to identify genetic variants underlying a disease or trait, but also to study evolution and population history as well as meiotic and mitotic recombination processes. Current genome-wide haplotyping methods rely on genomic DNA that is extracted from a large number of cells. Thus far random allele drop out and preferential amplification artifacts of...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Bioinformatics

دوره 33 20  شماره 

صفحات  -

تاریخ انتشار 2017